Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning
نویسندگان
چکیده
Abstract Mungojerrie is an extensible tool that provides a framework to translate linear-time objectives into reward for reinforcement learning (RL). The convergent RL algorithms stochastic games, reference implementations of existing translations $$\omega $$ ω -regular objectives, and internal probabilistic model checker objectives. This functionality modular operates on shared data structures, which enables fast development new translation techniques. supports finite models specified in PRISM -automata the HOA format, with integrated command line interface external linear temporal logic translators. distributed set benchmarks RL.
منابع مشابه
Reinforcement Learning: Model-free
Simply put, reinforcement learning (RL) is a term used to indicate a large family of dierent algorithms RL that all share two key properties. First, the objective of RL is to learn appropriate behavior through trialand-error experience in a task. Second, in RL, the feedback available to the learning agent is restricted to a reward signal that indicates how well the agent is behaving, but does ...
متن کاملMultitask model-free reinforcement learning
Conventional model-free reinforcement learning algorithms are limited to performing only one task, such as navigating to a single goal location in a maze, or reaching one goal state in the Tower of Hanoi block manipulation problem. It has been thought that only model-based algorithms could perform goal-directed actions, optimally adapting to new reward structures in the environment. In this wor...
متن کاملModel-Free Trajectory Optimization for Reinforcement Learning
Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-F...
متن کاملModel-Free Preference-Based Reinforcement Learning
Specifying a numeric reward function for reinforcement learning typically requires a lot of hand-tuning from a human expert. In contrast, preference-based reinforcement learning (PBRL) utilizes only pairwise comparisons between trajectories as a feedback signal, which are often more intuitive to specify. Currently available approaches to PBRL for control problems with continuous state/action sp...
متن کاملLearning Epistemic Actions in Model-Free Memory-Free Reinforcement Learning: Experiments with a Neuro-robotic Model
Passive sensory processing is often insufficient to guide biological organisms in complex environments. Rather, behaviourally relevant information can be accessed by performing so-called epistemic actions that explicitly aim at unveiling hidden information. However, it is still unclear how an autonomous agent can learn epistemic actions and how it can use them adaptively. In this work, we propo...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Lecture Notes in Computer Science
سال: 2023
ISSN: ['1611-3349', '0302-9743']
DOI: https://doi.org/10.1007/978-3-031-30823-9_27